In this vignette, we explore how to label your track files (activity and pressure) and provide tips to make the exercise more efficient. To see where this exercise fits in with the overall process, see the vignette How to use GeoPressureR.
library(GeoPressureR)
library(raster)
library(plotly)
library(RColorBrewer)
pam_data <- pam_read(
pathname = system.file("extdata", package = "GeoPressureR"),
crop_start = "2017-06-20", crop_end = "2018-05-02"
)Labelling principles
Labelling (manually) your tracks is imperative because geopressure requires highly precise and well-defined pressure timeseries of a fixed/constant locoation both in horizontal (geographical: +/- 10-50km) and vertical (altitude: +/- 2m).
The procedure involves labelling (1) activity data when the bird is migrating and (2) identifying pressure datapoints to be discarded from the matching exercise.
Activity labelling defines stationary periods and flight duration. A stationary periods is defined by period during which the bird is considered static relative to the size of the grid (~10-50km). The start and end of the stationary period is then used to define the pressure timeseries to be matched. Since flight duration is the key input in the movement model, having an accurate flight duration is critical to correctly estimate the distance traveled by the bird between two stationary periods.
Pressure labelling allows to eliminate vertical (altitudinal) movement of the bird. The pressure timeseries matching algorithm is sensitive to pressure variation of a few hPa, such that even a altitudinal movement of a couple of meters can throw off the estimation map for short stationary period. Since the reanalysis data to be matched is provided at a single pressure level, we must discard all data points fromt the geolocator pressure data corresponding to a different elevation.
Each species’ migration behaviour is so specific that manual editing remains the fastest option. Indeed, small movement corresponding to small change of pressure and high activity can correspond to local movement (birds essentially stays at the same location) or slow migration. Expertise on your bird expected migration style will be essential to correctly label your tracks.
Manual editing also provides a sense of what the bird is doing. You will learn how the bird is moving (e.g. long continuous high altitude flight, short flights over multiple days, alternation between short migration flights and stopovers, etc.). It also provides a sense of the uncertainty of your classification, which is useful to understand and interpret your results.
That being said, it is still worth starting the manual editing from an automatically labeled timeseries. pam_classify() defines migratory flight when activity is high for a long period. Refer to possible classification methods on the PAMLr manual.
pam_data <- pam_classify(pam_data, min_duration = 30)
trainset_write(pam_data, pathname = system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v1.csv")Finally, labeling is an iterative process where you will need to check the validity of the pressure timeseries at stationary period against the reanalysis data (more on this later). You can expect to spend between 30sec (e.g. Mangrove Kingfisher) to 10min (e.g. Eurasian Nightjar) per track depending on the species’ migrating complexity.
Introduction to TRAINSET
We are suggesting to use TRAINSET, a web based graphical tool for labelling time series. You can read more about TRAINSET on www.trainset.geocene.com and on their Github repository.
The tool interface is quite intuitive. Start by uploading your .csv file (e.g., 18IC_act_pres.csv).

Initial view of TRAINSET after uploading a file
A few tips:
-
Keyboard shortcuts can considerably speed up navigation (zoom in/out, move left/right) and labelling (add/remove a label), specifically with
SHIFT. - Because of the large number of datapoints, keeping a narrow temporal window will avoid your browser from becoming slow or irresponsive.
- Change the
Active SetiesandReference Seriesdepending on what you are labeling but use both timeseries at the same time to figure out what the bird might be doing. - Adapt with the y-axis range to each stationary period to properly see the small (but essential) pressure variations which are not visible in the full view
- TRAINSET offers more flexibility with the label than required: you can add and remove label values (bottom-right of the page). In order for
trainset_read()to work, do not change/edit/add any label, simply use the ones offered :TRUEandFALSE.
Four tests to check labelling
To improve and evaluate the quality of your labeling, you can use these four tests.
Test 1: Duration of stationary periods and flights
The first test consists in checking the duration of flights and stationary periods.
pam_data <- trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v1.csv")
pam_data <- pam_sta(pam_data)
pam_data$sta$duration <- difftime(pam_data$sta$end, pam_data$sta$start, units = "days")
pam_data$sta$next_flight_duration <- c(difftime(tail(pam_data$sta$start, length(pam_data$sta$start) - 1),
head(pam_data$sta$end, length(pam_data$sta$end) - 1),
units = "hours"
), 0)
knitr::kable(subset(pam_data$sta, duration < 0.25 | next_flight_duration < 1))| sta_id | start | end | duration | next_flight_duration | |
|---|---|---|---|---|---|
| 7 | 7 | 2017-08-30 23:45:00 | 2017-08-30 23:55:00 | 0.0069444 days | 4.2500000 hours |
| 14 | 14 | 2017-09-11 02:45:00 | 2017-09-11 23:35:00 | 0.8680556 days | 0.7500000 hours |
| 26 | 26 | 2018-04-15 14:55:00 | 2018-04-15 18:50:00 | 0.1631944 days | 0.6666667 hours |
| 27 | 27 | 2018-04-15 19:30:00 | 2018-04-15 20:10:00 | 0.0277778 days | 1.4166667 hours |
| 30 | 30 | 2018-04-29 23:35:00 | 2018-04-29 23:45:00 | 0.0069444 days | 2.8333333 hours |
| 31 | 31 | 2018-04-30 02:35:00 | 2018-04-30 18:40:00 | 0.6701389 days | 0.6666667 hours |
| 32 | 32 | 2018-04-30 19:20:00 | 2018-04-30 19:40:00 | 0.0138889 days | 2.0833333 hours |
| 33 | 33 | 2018-04-30 21:45:00 | 2018-04-30 21:55:00 | 0.0069444 days | 1.0833333 hours |
| 34 | 34 | 2018-04-30 23:00:00 | 2018-04-30 23:10:00 | 0.0069444 days | 0.8333333 hours |
| 35 | 35 | 2018-05-01 00:00:00 | 2018-05-01 00:10:00 | 0.0069444 days | 0.5833333 hours |
| 36 | 36 | 2018-05-01 00:45:00 | 2018-05-01 23:55:00 | 0.9652778 days | 0.0000000 hours |
Depending on your specific species, you may want to check the activity labelling of short flight (<1-2hr) as well as the activity labelling before and after short stationary periods (<1-10 hours). Note that the last row has a next_flight_duration of 0 because it is the last stationary period. Repeat this test until your are satisfy with the result.
Test 2: Pressure timeseries
In the second check, we visually inspect that the pressure timeseries of each stationary period are (1) correctly groups and (2) do not includes pressure outliar (altitudinal movement).
pam_data <- trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v2.csv")
pam_data <- pam_sta(pam_data)
# Set colorscale
col <- rep(RColorBrewer::brewer.pal(9, "Set1"), times = ceiling((nrow(pam_data$sta) + 1) / 9))
col <- col[1:(nrow(pam_data$sta) + 1)]
names(col) <- levels(factor(c(0, pam_data$sta$sta_id)))
pressure_na <- pam_data$pressure
pressure_na$obs[pressure_na$isoutliar | pressure_na$sta_id == 0] <- NA
p <- ggplot() +
geom_line(data = pam_data$pressure, aes(x = date, y = obs), col = "grey") +
geom_line(data = pressure_na, aes(x = date, y = obs, col = factor(sta_id))) +
geom_point(data = subset(pam_data$pressure, isoutliar), aes(x = date, y = obs), colour = "black") +
theme_bw() +
scale_colour_manual(values = col) +
scale_y_continuous(name = "Pressure (hPa)")
ggplotly(p, dynamicTicks = T) %>% layout(showlegend = F)Ploting this figure with Plotly allows you to zoom-in and pan to check all timeseries are correctly grouped. Make sure each stationary period does not include any pressure measurement from flight (e.g. 1-Sep-2017 in the figure above). You might spot some anomalies in the temporal variation of pressure. In some cases, you can already label the pressure timeseries to remove them.
Test 3: Pressure timeseries match
So far, we have checked that the pressure timeseries are correctly labeled with their respective stationary periods and that they look relatively smooth. At this stage, the timeseries are good enough to be matched with the reanalysis data. The third test consists of comparing the pressure timeseries from the geolocator to ERA5 at the location with the best match. This allows to distinguish bird movements from natural variations in pressure. This is the most difficult step, and multiple iterations will be necessary to achieve the best results.
Note that the location with the best match can be wrong for the short stationary periods. But the point of this exercise is to identify vertical movement of the bird and thus the location doesn’t really matter.
pam_data <- trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v3.csv")
pam_data <- pam_sta(pam_data)
sta_id_keep <- pam_data$sta$sta_id[difftime(pam_data$sta$end, pam_data$sta$start, units = "hours") > 12]
pam_data$pressure$sta_id[!(pam_data$pressure$sta_id %in% sta_id_keep)] <- NA
message("Number of stationay period to query: ", length(sta_id_keep))We can estimate the probability map for each stationary period with the following code. We will cover these fours functions in more details in the vignette Pressure Map. For each stationary period, we locate the best match and query the pressure timeseries with geopressure_ts() at this location.
pressure_maps <- geopressure_map(pam_data$pressure, extent = c(50, -16, 0, 23), scale = 10, max_sample = 100)
pressure_prob <- geopressure_prob_map(pressure_maps)
path <- geopressure_map2path(pressure_prob)
pressure_timeserie <- geopressure_ts_path(path, pam_data$pressure)We can now look at a similar figure of pressure timeseries, but this time comparing geolocator data with the best match from the reanalysis data.
p <- ggplot() +
geom_line(data = pam_data$pressure, aes(x = date, y = obs), colour = "grey") +
geom_point(data = subset(pam_data$pressure, isoutliar), aes(x = date, y = obs), colour = "black") +
geom_line(data = subset(do.call("rbind", pressure_timeserie), sta_id != 0), aes(x = date, y = pressure0, col = factor(sta_id))) +
theme_bw() +
scale_colour_manual(values = col) +
scale_y_continuous(name = "Pressure (hPa)")
ggplotly(p, dynamicTicks = T) %>% layout(showlegend = F)You can use this figure to identify periods where there is a mismatch between the geolocator and ERA5, usually indicative of altitudinal movement of the bird. Depending on the situation, there are mulitple way of labeling this mismatch. In the easier case, the bird simply flew within the same stationary site (<10-50km) for a short time and came back to the same location. In such case, you can simply label out the pressure timeserie during the temporary change of altitude. If the bird changed altitude but never came back to the same elevation, there a different way of solving this. You can either considered that the new altitude is a new stationay period and label the activity data. Otherwise, you can label out the pressure timerise of the shorter period. It is essential that the resulting pressure timeserie matches the ERA5 pressure at everywhere. Matches are usually better for the longer periods. Looking at the activity data during the same period can also help understand what the bird is doing.
In this example, removing a few more pressure datapoints can improve the match, especially for short stopover (e.g. 2017-9-11). See below for the final labeled file.
pam_data <- trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled.csv")
pam_data <- pam_sta(pam_data)
p <- ggplot() +
geom_line(data = pam_data$pressure, aes(x = date, y = obs), colour = "grey") +
geom_point(data = subset(pam_data$pressure, isoutliar), aes(x = date, y = obs), colour = "black") +
geom_line(data = subset(do.call("rbind", pressure_timeserie), sta_id != 0), aes(x = date, y = pressure0, col = factor(sta_id))) +
theme_bw() +
theme_bw() +
scale_colour_manual(values = col) +
scale_y_continuous(name = "Pressure (hPa)")
ggplotly(p, dynamicTicks = T) %>% layout(showlegend = F)Test 4: Histogram of pressure error
Finally, you can also look at the histogram of the pressure error (geolocator-ERA5). For long stationary periods (over 5 days), you want to check that there is a single mode in your distribution. Two modes indicate that the bird is spending time at two different altitudes. This is usual when birds have a day site and a night roost at different elevations. You might also want to check the spread of the distribution. This value can guide you in setting the standard deviation parameter s in geopressure_prob_map().
par(mfrow = c(5, 6), mar = c(1, 1, 3, 1))
for (i_r in seq_along(pressure_timeserie)) {
if (!is.null(pressure_timeserie[[i_r]])) {
i_s <- unique(pressure_timeserie[[i_r]]$sta_id)
df3 <- merge(pressure_timeserie[[i_r]], subset(pam_data$pressure, !isoutliar & sta_id == i_s), by = "date")
df3$error <- df3$pressure0 - df3$obs
hist(df3$error, main = i_s, xlab = "", ylab = "")
abline(v = 0, col = "red")
}
}
Common challenges and tips to address them
In the following section, we use examples to illustrate common challenges that may be encountered during manual editing, and offer suggestions on how to address them.
Outliars during flights due to low bird activity
During a flight, single activity measurements can display low activity due to e.g. short gliding flights with no flapping. The automatic labelling of activity with the KNN classifier may mislabel these points as stationary periods, as illustrated in the example below for the night of the 31st of August. A single mislabeled point can incorrectly split the flight into multiple short flights. This error is highlighted with Test #1 described above. However, birds may also display lower activity at the beginning or end of their flight, which is often misclassified, as illustrated in all three nights in the example below and would not be picked up by Test #1.

However, if the low activity happens well before the bird reaches the ground, as illustrated in the example below, the low pressure measurement of flight will be included in the stationary period. These error can sometimes be pick-up in Test #2. Yet this is worth checking all flights activity and assess on a case-by-case basis whether this such datapoints should be included in the flight or not.

Importance of zooming in before editing outliers
Anomalies in a pressure timeseries might not be obvious at first sight.

Zooming in to narrower pressure range helps to understand what is happening. In this example, we have a Tawny Pipit breeding near a mine site with a rough topography. While breeding, it looks like it is staying at a relatively constant elevation, but the sudden drop in pressure towards the end indicates that the bird has changed altitude.

In such cases, the aim is to discard all pressure datapoints recorded while the bird was at a different altitude. It may not always be obvious to distinguish temporal variation of pressure from when the bird actually changes altitude. We suggest keeping only the datapoints that you are confident with (here, the first part of the timeseries only) and running Test #3.

With a long timeseries such as this one, Test #3 will easily pick up the right location and the timeseries that you want to match. You can simply de-select the datapoints at the end of your timeseries that fit the ERA5 green line. For shorter timeseries, you might need several iterations to pick up the correct match.
Short stationary halts between flights
Interpreting bird behaviour and defining stationary periods can be difficult, for example when birds extend their migration into the day but with lower intensity, such that the end of flight is not clear.

In other cases, the bird stops for a couple of hours and then seems to be active again. This could be low-intensity migratory movement, a short break followed by more migratory flight, or landing at the stopover location, but relocating early morning with the light.

The question is whether to label these halts as stationary periods or not.
Referring to the pressure timeseries can help assess whether the bird changes location. For example, if the low activity is followed by high activity accompanied by pressure change, we can consider that the bird then changed location, and label the low activity as a stationary period.
However, the bird may also land and then complete local flights within its stopover location (with very little pressure variation), in which case we want to avoid creating two different stationary periods.
Test #3 helps ensure that no local vertical movements took place.

Mountainous species
Mountainous species display very specific behaviour with regular altitudinal changes.
This is very clear with the Ring Ouzel’s timeseries, which displays daily occurring movements, though not regular enough to make the process automatic, and sometimes changing in altitude. At this scale, it is difficult to assess the temporal variation of pressure, both the 790hPa and 900hPa pressure level might work, such that it is difficult to know which points to discard.
At this point it can help to zoom out on the time axis to see whether a certain elevation seems most common. Then proceed iteratively to keep only the datapoints at the same elevation. Test #4 is useful to ensure you did not forget any points.


The Eurasian Hoopoe presents more of a challenge as it moves continuously throughout the day, showing a more sinosoidal pattern.
This is the most challenging case as distinguishing temporal variation from altitudinal change is difficult.
Several iterations should lead to a relatively smooth pressure timeseries. Note that in order to estimate the uncertainty correctly for such cases, the standard deviation s should be increased. Thankfully, this behaviour is restricted to its breeding ground.

In some cases, finding a single timeseries is impossible, such as for the wintering site of this Ring Ouzel, never returning to the same elevation. In such cases, we discard the entire timeseries and use only the mask of absolute pressure values.

Luckily, by definition mountainous species live in specific areas, which restricts possible locations. In this case, based on previous stationary periods we can establish that the bird was in Morocco, and with such low pressure (i.e. high elevation), only the Atlas mountains fit the pressure mask.

Future improvements
A lot can be done to improve this process:
- Run TRAINSET offline
- Bypass the create/upload/read csv by running a browser session directly in R
- Build an R (shiny) equivalent of TRAINSET to be directly integrated with the R package, but unable to find a good package to label points in a Figure in R, would have to maintain it, while TRAINSET does this for free.
- Any suggestions? Write an issue on Gitub